Enhanced shape-invariant pitch and time-scale modification for concatenative speech synthesis
Abstract
To preserve shape invariance when pitch- or time-scale modifying sinusoidally modelled voiced speech, the phases of the sinusoids used to model the glottal excitation are made to add coherently at estimated excitation points. Previous methods achieve this by estimating excitation phases at synthesis frame boundaries, thereby disregarding the frequency modulation that may occur between the frame boundary and the nearest modified excitation point. This approximation can produce a significant misalignment of the excitation phases, which distorts the temporal structure of the synthetic speech. In this paper, a shape-invariant technique is proposed that aligns the excitation phases at the excitation points themselves, while allowing for variations in the frequency of the sinusoidal components.
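As a rough numerical illustration of the misalignment described above, the sketch below compares the phase a sinusoidal component reaches at a frame boundary under two assumptions: constant frequency between the excitation point and the boundary, versus a frequency that changes across the frame. It assumes (this is not stated in the abstract) that each component's frequency varies linearly between frame boundaries and that the constant-frequency approximation uses the frame-boundary frequency. The function names are hypothetical, and the code illustrates only the phase-accumulation issue; it is not the paper's algorithm.

```python
import math

# Hypothetical helpers for illustration only. The linear frequency
# interpolation across a synthesis frame is an assumption, not the paper's model.

TWO_PI = 2.0 * math.pi


def omega_at(t, t0, t1, omega0, omega1):
    """Instantaneous frequency (rad/s) at time t, linearly interpolated
    between omega0 at frame boundary t0 and omega1 at frame boundary t1."""
    return omega0 + (omega1 - omega0) * (t - t0) / (t1 - t0)


def phase_constant_freq(t_boundary, t_exc, omega):
    """Constant-frequency approximation: the component is assumed to keep
    frequency `omega` between the excitation point and the frame boundary."""
    return omega * (t_boundary - t_exc)


def phase_linear_freq(t_boundary, t_exc, t0, t1, omega0, omega1):
    """Phase at t_boundary of a component whose phase is zero at the
    excitation point t_exc, integrating a frequency that varies linearly
    over the frame [t0, t1]. The trapezoidal rule is exact for a linear
    integrand, so no numerical integration is needed."""
    w_exc = omega_at(t_exc, t0, t1, omega0, omega1)
    w_bnd = omega_at(t_boundary, t0, t1, omega0, omega1)
    return 0.5 * (w_exc + w_bnd) * (t_boundary - t_exc)


def wrap(phi):
    """Wrap a phase in radians to the interval (-pi, pi]."""
    return math.pi - ((math.pi - phi) % TWO_PI)


if __name__ == "__main__":
    t0, t1 = 0.0, 0.010            # synthesis frame boundaries (s)
    t_exc = 0.004                  # modified excitation point inside the frame (s)
    # 10th harmonic of an f0 rising from 100 Hz to 120 Hz across the frame
    w0, w1 = TWO_PI * 1000.0, TWO_PI * 1200.0
    approx = phase_constant_freq(t1, t_exc, w1)
    exact = phase_linear_freq(t1, t_exc, t0, t1, w0, w1)
    # Phase misalignment introduced by the constant-frequency approximation
    print(round(wrap(approx - exact), 2))  # → 2.26
```

With these example values the constant-frequency approximation misplaces the 10th harmonic's phase by about 0.72π rad (roughly 2.26 rad). An error of this size is enough to disturb the coherent addition of the components at the excitation point, and it grows with harmonic number and with the rate of f0 change.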
Similar papers
Shape invariant time-scale modification of speech using a harmonic model
A new and simple approach to shape-invariant time-scale modification of speech is presented. The method, based upon a harmonic coding of each speech frame, operates entirely within the original sinusoidal model [3] and makes no use of "pitch-pulse onset times" used by conventional algorithms. Instead, phase coherence, and thus shape invariance, are ensured by exploiting the harmonic relation exis...
Source-filter models for time-scale pitch-scale modification of speech
This paper presents two time-scale pitch-scale modification techniques to be used in speech synthesis systems. They have been applied to Microsoft’s Whistler system, which is based on concatenative synthesis. Both methods are based on a source-filter model, one of them using LPC parameters and the other one using cepstral parameters. The proposed methods achieve high quality prosody modification...
A hybrid method oriented to concatenative text-to-speech synthesis
In this paper we present a speech synthesis method for diphone-based text-to-speech systems. Its main goal is to achieve prosodic modifications that result in more natural-sounding synthetic speech. This improvement is especially useful for emotional speech synthesis, which requires high-quality prosodic modification. We present a hybrid method based on TD-PSOLA and the harmonic plus noise model...
Maximum-likelihood dynamic intonation model for concatenative text-to-speech system
In this work we present a Maximum Likelihood (ML) joint pitch curve modeling, inspired by HMM TTS synthesis concept. This model provides an optimal solution for the coarse target intonation curve (3 points per syllable) and incorporates both static and dynamic pitch values for better utterance intonation modeling. The coarse intonation curve may be optionally combined with the original pitch ex...
Development of Concatenative Syllable based Text to Speech Synthesis System for Tamil
This paper addresses the problem of improving the intelligibility of the synthesized speech in a Tamil TTS synthesis system. Speech synthesis is the artificial generation of human speech; a text-to-speech (TTS) system automatically converts normal-language text into speech. This paper deals with a corpus-driven Tamil TTS system based on the concatenative synthesis approach. Conc...